Learning Diverse and Discriminative Representations via the Principle of Maximal Coding Rate Reduction

Neural Information Processing Systems

To learn intrinsic low-dimensional structures from high-dimensional data that most discriminate between classes, we propose the principle of Maximal Coding Rate Reduction ($\text{MCR}^2$), an information-theoretic measure that maximizes the difference between the coding rate of the whole dataset and the sum of the coding rates of the individual classes. We clarify its relationships with most existing frameworks such as cross-entropy, information bottleneck, information gain, contractive and contrastive learning, and provide theoretical guarantees for learning diverse and discriminative features. The coding rate can be accurately computed from finite samples of degenerate subspace-like distributions, and the principle can learn intrinsic representations in supervised, self-supervised, and unsupervised settings in a unified manner. Empirically, the representations learned using this principle alone are significantly more robust to label corruptions in classification than those using cross-entropy, and can lead to state-of-the-art results in clustering mixed data from self-learned invariant features.
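The abstract states the objective only in words. The following is a minimal NumPy sketch, assuming the coding-rate formulas from the full $\text{MCR}^2$ paper, which this abstract does not reproduce: $R(Z,\epsilon)=\tfrac12\log\det\big(I+\tfrac{d}{m\epsilon^2}ZZ^\top\big)$ for features $Z\in\mathbb{R}^{d\times m}$, and, for hard class labels, $\Delta R = R(Z,\epsilon) - \sum_j \tfrac{m_j}{m} R(Z_j,\epsilon)$. The precision `eps=0.5`, the toy dimensions, the natural logarithm, and the helper `two_class_features` are illustrative choices, not values from the paper. Both runs share a seed, so the per-class rates are identical and the comparison isolates the whole-dataset term.

```python
import numpy as np


def coding_rate(Z: np.ndarray, eps: float = 0.5) -> float:
    """R(Z, eps) = 1/2 * logdet(I + d / (m * eps^2) * Z Z^T) for Z of shape (d, m)."""
    d, m = Z.shape
    # slogdet is numerically safer than log(det(...)) as d grows
    _, logdet = np.linalg.slogdet(np.eye(d) + (d / (m * eps ** 2)) * (Z @ Z.T))
    return 0.5 * logdet


def rate_reduction(Z: np.ndarray, labels: np.ndarray, eps: float = 0.5) -> float:
    """Delta R = R(Z) - sum_j (m_j / m) * R(Z_j): whole-set rate minus weighted class rates."""
    m = Z.shape[1]
    per_class = 0.0
    for c in np.unique(labels):
        Zc = Z[:, labels == c]
        # (m_j / m) * R(Z_j) equals tr(Pi_j)/(2m) * logdet(I + d/(tr(Pi_j) eps^2) Z Pi_j Z^T)
        per_class += (Zc.shape[1] / m) * coding_rate(Zc, eps)
    return coding_rate(Z, eps) - per_class


def two_class_features(U1, U2, n=50, seed=0):
    """Hypothetical toy data: n unit-norm features per class on span(U1) and span(U2)."""
    rng = np.random.default_rng(seed)
    C1 = rng.standard_normal((2, n))
    C2 = rng.standard_normal((2, n))
    # U1, U2 have orthonormal columns, so normalizing coefficients gives unit-norm features
    C1 /= np.linalg.norm(C1, axis=0)
    C2 /= np.linalg.norm(C2, axis=0)
    Z = np.hstack([U1 @ C1, U2 @ C2])
    labels = np.array([0] * n + [1] * n)
    return Z, labels


E = np.eye(4)
Z_orth, y = two_class_features(E[:, [0, 1]], E[:, [2, 3]])  # orthogonal class subspaces
Z_over, _ = two_class_features(E[:, [0, 1]], E[:, [0, 2]])  # subspaces share direction e1

dr_orth = rate_reduction(Z_orth, y)
dr_over = rate_reduction(Z_over, y)
print(f"orthogonal: {dr_orth:.3f}, overlapping: {dr_over:.3f}")
```

Since $\log\det$ is concave, $\Delta R$ is non-negative for any labeling, and with the class rates held fixed it should be larger when the class subspaces are orthogonal than when they share a direction, which matches the "diverse and discriminative" behavior the abstract describes.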


Review for NeurIPS paper: Learning Diverse and Discriminative Representations via the Principle of Maximal Coding Rate Reduction

Weaknesses:
- My main concern is that I don't see the benefits of modeling the data as a union of subspaces, where each subspace corresponds to a class, when the representation space is *learned*, particularly since these subspaces won't be orthogonal in practice on real data. In an unsupervised setting, recovering the subspaces requires subspace clustering, which is a hard and computationally expensive problem. In stark contrast, a linear head trained with a cross-entropy loss learns a representation space with approximately linearly separable regions for each class. As a consequence, classification is simple (linear) and Lp distances in representation space are meaningful (which is not necessarily the case when the classes lie on a union of subspaces). However, there are many other methods that can make neural networks with a linear classification head more robust, for example [c].
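The reviewer contrasts linear classification with the union-of-subspaces view. For comparison, one common decision rule for features lying near class subspaces is nearest-subspace classification: fit a low-dimensional basis per class (here via SVD) and assign each sample to the class whose subspace leaves the smallest projection residual. This sketch is hypothetical throughout (the function names, subspace dimension `k=2`, noise level, and toy data are illustrative assumptions), and it assumes labeled training features, so it sidesteps the unsupervised subspace-clustering step the reviewer calls expensive. The two toy class subspaces are deliberately non-orthogonal.

```python
import numpy as np


def fit_subspaces(Z, labels, k):
    """For each class, keep the top-k left singular vectors of its (d x m_j) feature matrix."""
    bases = {}
    for c in np.unique(labels):
        U, _, _ = np.linalg.svd(Z[:, labels == c], full_matrices=False)
        bases[c] = U[:, :k]
    return bases


def predict_nearest_subspace(Z, bases):
    """Assign each column of Z to the class whose subspace leaves the smallest residual."""
    classes = sorted(bases)
    residuals = np.stack([
        np.linalg.norm(Z - bases[c] @ (bases[c].T @ Z), axis=0) for c in classes
    ])
    return np.array(classes)[np.argmin(residuals, axis=0)]


rng = np.random.default_rng(0)
u = np.array([1.0, 0.0, 1.0, 0.0]) / np.sqrt(2)
B0 = np.eye(4)[:, [0, 1]]                   # class 0: span(e1, e2)
B1 = np.column_stack([u, np.eye(4)[:, 3]])  # class 1: span((e1+e3)/sqrt(2), e4), not orthogonal to B0


def sample(B, n):
    """Unit-norm points on span(B) plus small isotropic noise (hypothetical toy data)."""
    C = rng.standard_normal((2, n))
    X = B @ (C / np.linalg.norm(C, axis=0))
    return X + 0.01 * rng.standard_normal(X.shape)


Z_train = np.hstack([sample(B0, 100), sample(B1, 100)])
y_train = np.array([0] * 100 + [1] * 100)
Z_test = np.hstack([sample(B0, 50), sample(B1, 50)])
y_test = np.array([0] * 50 + [1] * 50)

bases = fit_subspaces(Z_train, y_train, k=2)
acc = np.mean(predict_nearest_subspace(Z_test, bases) == y_test)
print(f"nearest-subspace accuracy: {acc:.2f}")
```

This rule is not linear in the features: comparing residual norms amounts to comparing $\|U_0^\top x\|^2$ with $\|U_1^\top x\|^2$, so the decision boundary is a quadratic cone set by the angles between the subspaces rather than a single separating hyperplane.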


learning diverse and discriminative representation, maximal coding rate reduction